Ranking Biomedical Passages for Relevance and Diversity: University of Wisconsin, Madison at TREC Genomics 2006

نویسندگان

  • Andrew B. Goldberg
  • David Andrzejewski
  • Jurgen Van Gael
  • Burr Settles
  • Xiaojin Zhu
  • Mark Craven
چکیده

We report on the University of Wisconsin, Madison’s experience in the TREC Genomics 2006 track, which asks participants to retrieve passages from scientific articles that satisfy biologists’ information needs. An emphasis is placed on returning relevant passages that discuss different aspects of the topic. Using an off-the-shelf information retrieval (IR) engine, we focused on query generation and reranking query results to encourage relevance and diversity. For query generation, we automatically identify noun phrases from the topic descriptions, and use online resources to gather synonyms as expansion terms. Our first submission uses the baseline IR engine results. We rerank the passages using a näıve clustering-based approach in our second run, and we test GRASSHOPPER, a novel graph-theoretic algorithm based on absorbing random walks, in our third run. While our aspect-level results appear to compare favorably with other participants’ on average, our query generation techniques failed to produce adequate query results for several topics, causing our passage and document-level evaluation scores to suffer. Furthermore, we surprisingly achieved higher aspect-level scores using the initial ranking than our methods aimed specifically at promoting diversity. While this sounds discouraging, we have several ideas as to why this happened and hope to produce new methods that correct these shortcomings.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ASU at TREC 2006 Genomics Track

This paper describes our experiments in the TREC 2006 Genomics track submitted by the ASU BioAI group, as well as experiments based on the improvements made after our submission. Some of the major issues we tried to address in our experiments are how to (1) extract keywords from natural language questions in the biomedical domain and (2) determine the relevancy of passages.

متن کامل

Combining Vector-Space and Word-Based Aspect Models for Passage Retrieval

This report summarizes the work done at Kyoto University and the University of Melbourne for the TREC 2006 Genomics Track. The single task for this year was to retrieve passages from a biomedical document collection. We devised a system made of two parts to deal with this problem. The first part was an existing IR system based on the vector-space model. The second part was a newly developed pro...

متن کامل

Combining Lexicon Expansion, Information Retrieval, and Cluster-based Ranking for Biomedical Question Answering

The Oregon Health & Science University submission to the TREC 2006 Genomics Track approached the question answer extraction task in three phases. In the first phase the biological questions were parsed into relevant entities and query expressions were generated. The second phase retrieved relevant passages from the corpus using Lucene as an information retrieval engine. The third phase performe...

متن کامل

Concept Recognition, Information Retrieval, and Machine Learning in Genomics Question-Answering

TREC Genomics 2006 presented a genomics question-answering challenge with questions on twenty-seven topics, and a corpus of 162,259 full-text biomedical journal articles from which to derive answers. Questions were formulated from actual information needs of biomedical researchers, and performance was based on human evaluation of the answers. The University of Colorado approach to this task inv...

متن کامل

DUTIR at TREC 2006: Genomics and Enterprise Tracks

This paper describes the techniques we applied for the two TREC 2006 tracks, i.e., Genomics and Enterprise track. For the Genomics Track, we used a Rocchio relevance feedback method to expand the terms and then performed passage retrieval by building dual index and using half overlapped windows passages. Several approaches to merge the results and rerank the passages are presented. For the Ente...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006